Abstract

The approximation of fixed-interval smoothing distributions is a key 
issue in inference for general state-space hidden Markov models (HMM). 
This contribution establishes non-asymptotic bounds for the Forward Fil- 
tering Backward Smoothing (FFBS) and the Forward Filtering Backward 
Simulation (FFBSi) estimators of fixed-interval smoothing functionals. 
We show that the rate of convergence of the $L_q$-mean errors of both
methods depends on the number of observations $T$ and the number of particles
$N$ only through the ratio $T/N$ for additive functionals. In the case of
the FFBS, this improves recent results providing bounds depending on
$T/\sqrt{N}$.

1 Introduction 

State-space models play a key role in statistics, engineering and econometrics;
see [IIITTIIT^. Consider a process $\{X_t\}_{t \ge 0}$ taking values in a
general state-space $\mathbb{X}$. This hidden process can be observed only
through the observation process $\{Y_t\}_{t \ge 0}$ taking values in
$\mathbb{Y}$. Statistical inference in general state-space models involves the
computation of expectations of additive functionals of the form

$$\sum_{t=1}^{T} h_t(X_{t-1}, X_t) ,$$

conditionally on $\{Y_t\}_{t=0}^{T}$, where $T$ is a positive integer and
$\{h_t\}_{t=1}^{T}$ are functions defined on $\mathbb{X}^2$. These smoothed
additive functionals appear naturally for maximum likelihood parameter
inference in hidden Markov models. The computation of the gradient of the
log-likelihood function (Fisher score) or of the intermediate quantity of the
Expectation Maximization algorithm involves the estimation of such smoothed
functionals, see [2, Chapters 10 and 11] and [10].

Except for linear Gaussian state-spaces or for finite state-spaces, these smoothed 
additive functionals cannot be computed explicitly. In this paper, we consider 
Sequential Monte Carlo algorithms, henceforth referred to as particle methods, 
to approximate these quantities. These methods combine sequential importance 
sampling and sampling importance resampling steps to produce a set of random 
particles with associated importance weights to approximate the fixed-interval 
smoothing distributions. 

The most straightforward implementation is based on the so-called path-space
method. The complexity of this algorithm per time-step grows only linearly
with the number $N$ of particles, see [3]. However, this algorithm suffers
from a well-known shortcoming, referred to in the literature as path
degeneracy; see [10] for a discussion.

Several solutions have been proposed to solve this degeneracy problem. In
this paper, we consider the Forward Filtering Backward Smoothing algorithm
(FFBS) and the Forward Filtering Backward Simulation algorithm (FFBSi)
introduced in (S] and further developed in [T3]. Both algorithms proceed in two
passes. In the forward pass, a set of particles and weights is stored. In the
backward pass of the FFBS the weights are modified but the particles are kept
fixed. The FFBSi draws independently different particle trajectories among all
possible paths. Since they use a backward step, these algorithms are mainly
suited to batch estimation problems. However, as shown in [5], when applied
to additive functionals, the FFBS algorithm can be implemented forward in
time, but its complexity grows quadratically with the number of particles. As
shown in [8], it is possible to implement the FFBSi with a complexity growing
only linearly with the number of particles.

The control of the $L_q$-norm of the deviation between the smoothed additive
functional and its particle approximation has been studied recently. In
an unpublished paper [6], it is shown that the FFBS estimator variance of any
smoothed additive functional is upper bounded by terms depending on $T$ and
$N$ only through the ratio $T/N$. Furthermore, in [5], for any $q \ge 2$, an
$L_q$-mean error bound for smoothed functionals computed with the FFBS is
established. When applied to strongly mixing kernels, this bound is of order
$T/\sqrt{N}$ either for

(i) uniformly bounded in time general path-dependent functionals,

(ii) unnormalized additive functionals (see [5, Eq. (3.8), p. 957]).

In this paper, we establish $L_q$-mean error and exponential deviation
inequalities for both the FFBS and FFBSi smoothed functionals estimators. We
show that, for any $q \ge 2$, the $L_q$-mean error for both algorithms is upper
bounded by terms depending on $T$ and $N$ only through the ratio $T/N$ under
the strong mixing conditions for (i) and (ii). We also establish an exponential
deviation inequality with the same functional dependence in $T$ and $N$.

This paper is organized as follows. Section 2 introduces further definitions
and notations and the FFBS and FFBSi algorithms. In Section 3, upper bounds
for the $L_q$-mean error and exponential deviation inequalities of these two
algorithms are presented. In Section 4, some Monte Carlo experiments are
presented to support our theoretical claims. The proofs are presented in
Sections 5 and 6.



2 Framework 

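Let $\mathbb{X}$ and $\mathbb{Y}$ be two general state-spaces endowed with
countably generated $\sigma$-fields $\mathcal{X}$ and $\mathcal{Y}$. Let $M$ be
a Markov transition kernel defined on $\mathbb{X} \times \mathcal{X}$ and
$\{g_t\}_{t \ge 0}$ a family of functions defined on $\mathbb{X}$. It is
assumed that, for any $x \in \mathbb{X}$, $M(x, \cdot)$ has a density
$m(x, \cdot)$ with respect to a reference measure $\lambda$ on
$(\mathbb{X}, \mathcal{X})$. For any integers $T \ge 0$ and
$0 \le s \le t \le T$, any measurable function $h$ on $\mathbb{X}^{t-s+1}$,
and any probability distribution $\chi$ on $(\mathbb{X}, \mathcal{X})$, define

$$\phi_{s:t|T}[h] \stackrel{\mathrm{def}}{=} \frac{\int \chi(dx_0) g_0(x_0) \prod_{u=1}^{T} M(x_{u-1}, dx_u) g_u(x_u) \, h(x_{s:t})}{\int \chi(dx_0) g_0(x_0) \prod_{u=1}^{T} M(x_{u-1}, dx_u) g_u(x_u)} , \tag{1}$$

where $a_{u:v}$ is a short-hand notation for $\{a_s\}_{s=u}^{v}$. The
dependence on $g_{0:T}$ is implicit and is dropped from the notations.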

Remark 1. Note that this equation has a simple interpretation in the particular
case of hidden Markov models. Indeed, let $(\Omega, \mathcal{F}, \mathbb{P})$
be a probability space and $\{X_t\}_{t \ge 0}$ a Markov chain on
$(\Omega, \mathcal{F}, \mathbb{P})$ with transition kernel $M$ and initial
distribution $\chi$ (which we denote $X_0 \sim \chi$). Let
$\{Y_t\}_{t \ge 0}$ be a sequence of observations on
$(\Omega, \mathcal{F}, \mathbb{P})$ conditionally independent given
$\sigma(X_t, t \ge 0)$ and such that the conditional distribution of $Y_u$
given $\sigma(X_t, t \ge 0)$ has a density given by $g(X_u, \cdot)$ with
respect to a reference measure on $\mathcal{Y}$, and set
$g_u(x) = g(x, Y_u)$. Then, the quantity $\phi_{s:t|T}[h]$ defined by (1) is
the conditional expectation of $h(X_{s:t})$ given $Y_{0:T}$:

$$\phi_{s:t|T}[h] = \mathbb{E}\left[ h(X_{s:t}) \mid Y_{0:T} \right] , \quad X_0 \sim \chi .$$

In its original version, the FFBS algorithm proceeds in two passes. In the
forward pass, each filtering distribution $\phi_t \stackrel{\mathrm{def}}{=}
\phi_{t:t|t}$, for any $t \in \{0, \dots, T\}$, is approximated using weighted
samples $\{(\xi_t^{N,\ell}, \omega_t^{N,\ell})\}_{\ell=1}^{N}$, where $T$ is
the number of observations and $N$ the number of particles: all sampled
particles and weights are stored. In the backward pass of the FFBS, these
importance weights are then modified (see [HI HSl US]) while the particle
positions are kept fixed. The importance weights are updated recursively
backward in time to obtain an approximation of the fixed-interval smoothing
distributions $\{\phi_{s:T|T}\}_{s=0}^{T}$. The particle approximation is
constructed as follows.



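Forward pass. Let $\{\xi_0^{N,\ell}\}_{\ell=1}^{N}$ be i.i.d. random variables
distributed according to the instrumental density $\rho_0$ and set the
importance weights $\omega_0^{N,\ell} \stackrel{\mathrm{def}}{=}
\frac{d\chi}{d\rho_0}(\xi_0^{N,\ell}) \, g_0(\xi_0^{N,\ell})$. The weighted
sample $\{(\xi_0^{N,\ell}, \omega_0^{N,\ell})\}_{\ell=1}^{N}$ then targets the
initial filter $\phi_0$ in the sense that $\phi_0^N[h] \stackrel{\mathrm{def}}{=}
\sum_{\ell=1}^{N} \omega_0^{N,\ell} h(\xi_0^{N,\ell}) / \sum_{\ell=1}^{N}
\omega_0^{N,\ell}$ is a consistent estimator of $\phi_0[h]$ for any bounded
and measurable function $h$ on $\mathbb{X}$. Let now
$\{(\xi_{s-1}^{N,\ell}, \omega_{s-1}^{N,\ell})\}_{\ell=1}^{N}$ be a weighted
sample targeting $\phi_{s-1}$. We aim at computing new particles and
importance weights targeting the probability distribution $\phi_s$. Following
[17], this may be done by simulating pairs
$\{(I_s^{N,\ell}, \xi_s^{N,\ell})\}_{\ell=1}^{N}$ of indices and particles from
the instrumental distribution:

$$\pi_{s|s}(\ell, h) \propto \omega_{s-1}^{N,\ell} \, \vartheta_s(\xi_{s-1}^{N,\ell}) \, P_s(\xi_{s-1}^{N,\ell}, h) ,$$

on the product space $\{1, \dots, N\} \times \mathbb{X}$, where
$\{\omega_{s-1}^{N,\ell} \vartheta_s(\xi_{s-1}^{N,\ell})\}_{\ell=1}^{N}$ are
the adjustment multiplier weights and $P_s$ is a Markovian proposal transition
kernel. In the sequel, we assume that $P_s(x, \cdot)$ has, for any
$x \in \mathbb{X}$, a density $p_s(x, \cdot)$ with respect to the reference
measure $\lambda$. For any $\ell \in \{1, \dots, N\}$ we associate to the
particle $\xi_s^{N,\ell}$ its importance weight defined by:

$$\omega_s^{N,\ell} \stackrel{\mathrm{def}}{=} \frac{m(\xi_{s-1}^{N,I_s^{N,\ell}}, \xi_s^{N,\ell}) \, g_s(\xi_s^{N,\ell})}{\vartheta_s(\xi_{s-1}^{N,I_s^{N,\ell}}) \, p_s(\xi_{s-1}^{N,I_s^{N,\ell}}, \xi_s^{N,\ell})} .$$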
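The forward pass can be sketched in a few lines for the simplest (bootstrap) choice, in which the proposal density equals the transition density and the adjustment multipliers are constant, so that each importance weight reduces to the observation density. The sketch below is an illustration only: it assumes a scalar linear Gaussian model, and the function name `forward_pass` and its parameters are hypothetical, not taken from the paper.

```python
import math
import random


def forward_pass(ys, n_particles, phi=0.9, sigma_u=0.6, sigma_v=1.0, seed=0):
    """Bootstrap forward pass for X_{t+1} = phi*X_t + sigma_u*U_t, Y_t = X_t + sigma_v*V_t.

    Assumptions (illustrative): the initial proposal equals the stationary
    initial law, the proposal kernel equals the transition kernel and the
    adjustment multipliers equal 1, so each weight is g_t(particle).
    Returns the stored particles and unnormalized weights for every time step.
    """
    rng = random.Random(seed)
    sd0 = sigma_u / math.sqrt(1.0 - phi ** 2)  # stationary standard deviation

    def g(x, y):
        # Observation density of y given the state x.
        z = (y - x) / sigma_v
        return math.exp(-0.5 * z * z) / (sigma_v * math.sqrt(2.0 * math.pi))

    particles = [[rng.gauss(0.0, sd0) for _ in range(n_particles)]]
    weights = [[g(x, ys[0]) for x in particles[0]]]
    for y in ys[1:]:
        prev_x, prev_w = particles[-1], weights[-1]
        # Draw ancestor indices proportionally to the previous weights.
        ancestors = rng.choices(range(n_particles), weights=prev_w, k=n_particles)
        # Propagate each selected ancestor through the transition kernel.
        new_x = [phi * prev_x[i] + sigma_u * rng.gauss(0.0, 1.0) for i in ancestors]
        particles.append(new_x)
        weights.append([g(x, y) for x in new_x])
    return particles, weights


demo_particles, demo_weights = forward_pass([0.1, -0.3, 0.5], n_particles=50)
```

All particles and weights are kept, since both backward passes described next reuse them.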

Backward smoothing. For any probability measure $\eta$ on
$(\mathbb{X}, \mathcal{X})$, denote by $B_\eta$ the backward smoothing kernel
given, for all bounded measurable function $h$ on $\mathbb{X}$ and for all
$x \in \mathbb{X}$, by:

$$B_\eta(x, h) \stackrel{\mathrm{def}}{=} \frac{\int \eta(dx') \, m(x', x) \, h(x')}{\int \eta(dx') \, m(x', x)} .$$

For all $s \in \{0, \dots, T-1\}$ and for all bounded measurable function $h$
on $\mathbb{X}^{T-s+1}$, $\phi_{s:T|T}[h]$ may be computed recursively,
backward in time, according to

$$\phi_{s:T|T}[h] = \int B_{\phi_s}(x_{s+1}, dx_s) \, \phi_{s+1:T|T}(dx_{s+1:T}) \, h(x_{s:T}) .$$

2.1 The forward filtering backward smoothing algorithm 

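Consider the weighted samples
$\{(\xi_t^{N,\ell}, \omega_t^{N,\ell})\}_{\ell=1}^{N}$, drawn for any
$t \in \{0, \dots, T\}$ in the forward pass. An approximation of the
fixed-interval smoothing distribution can be obtained using

$$\phi^N_{s:T|T}[h] = \int B_{\phi^N_s}(x_{s+1}, dx_s) \, \phi^N_{s+1:T|T}(dx_{s+1:T}) \, h(x_{s:T}) , \tag{2}$$

and starting with $\phi^N_{T:T|T}[h] = \phi^N_T[h]$. Now, by definition, for
all $x \in \mathbb{X}$ and for all bounded measurable function $h$ on
$\mathbb{X}$,

$$B_{\phi^N_s}(x, h) = \sum_{i=1}^{N} \frac{\omega_s^{N,i} \, m(\xi_s^{N,i}, x)}{\sum_{\ell=1}^{N} \omega_s^{N,\ell} \, m(\xi_s^{N,\ell}, x)} \, h(\xi_s^{N,i}) ,$$

and inserting this expression into (2) gives the following particle
approximation of the fixed-interval smoothing distribution
$\phi_{0:T|T}[h]$:

$$\phi^N_{0:T|T}[h] = \sum_{i_0=1}^{N} \cdots \sum_{i_T=1}^{N} \left( \prod_{u=1}^{T} \Lambda^N_{u-1}(i_u, i_{u-1}) \right) \frac{\omega_T^{N,i_T}}{\Omega_T^N} \, h\left( \xi_0^{N,i_0}, \dots, \xi_T^{N,i_T} \right) , \tag{3}$$

where $h$ is a bounded measurable function on $\mathbb{X}^{T+1}$,

$$\Lambda^N_t(i, j) \stackrel{\mathrm{def}}{=} \frac{\omega_t^{N,j} \, m(\xi_t^{N,j}, \xi_{t+1}^{N,i})}{\sum_{\ell=1}^{N} \omega_t^{N,\ell} \, m(\xi_t^{N,\ell}, \xi_{t+1}^{N,i})} , \quad (i, j) \in \{1, \dots, N\}^2 , \tag{4}$$

and

$$\Omega_t^N \stackrel{\mathrm{def}}{=} \sum_{\ell=1}^{N} \omega_t^{N,\ell} . \tag{5}$$

The estimator of the fixed-interval smoothing distribution $\phi^N_{0:T|T}$
might seem impractical since the cardinality of its support is $N^{T+1}$.
Nevertheless, for additive functionals of the form

$$S_{T,r}(x_{0:T}) = \sum_{t=r}^{T} h_t(x_{t-r:t}) , \tag{6}$$

where $r$ is a non negative integer and $\{h_t\}_{t=r}^{T}$ is a family of
bounded measurable functions on $\mathbb{X}^{r+1}$, the complexity of the FFBS
algorithm is reduced to $O(N^{r+2})$.

Furthermore, the smoothing of such functions can be computed forward in time
as shown in [5]. This forward algorithm is exactly the one presented in [10] as
an alternative to the use of the path-space method. Therefore, the results
outlined in Section 3 hold for this method and confirm the conjecture
mentioned in [10].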
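For $r = 0$ the backward pass only has to propagate marginal smoothing weights, which is where the quadratic cost in $N$ per time step comes from. The following is a minimal sketch under stated assumptions: `particles` and `weights` are the lists stored by a forward pass, `m` is a user-supplied transition density, and all function names are hypothetical rather than taken from the paper.

```python
import math


def gaussian_transition(x, x_next, phi=0.9, sigma_u=0.6):
    """Illustrative transition density m(x, x_next) of a Gaussian AR(1) kernel."""
    z = (x_next - phi * x) / sigma_u
    return math.exp(-0.5 * z * z) / (sigma_u * math.sqrt(2.0 * math.pi))


def ffbs_marginal_weights(particles, weights, m):
    """Backward pass of the FFBS restricted to marginal smoothing weights.

    smooth[s][j] approximates the smoothing probability attached to
    particles[s][j]; each backward step costs O(N^2) density evaluations.
    """
    T = len(particles) - 1
    total = sum(weights[T])
    smooth = [None] * (T + 1)
    smooth[T] = [w / total for w in weights[T]]
    for s in range(T - 1, -1, -1):
        n = len(particles[s])
        new = [0.0] * n
        for i, x_next in enumerate(particles[s + 1]):
            # Unnormalized row i of the backward transition matrix.
            row = [weights[s][j] * m(particles[s][j], x_next) for j in range(n)]
            norm = sum(row)
            for j in range(n):
                new[j] += smooth[s + 1][i] * row[j] / norm
        smooth[s] = new
    return smooth


def smoothed_additive(particles, smooth, hs):
    """FFBS estimate of the sum over t of E[h_t(X_t) | Y_{0:T}]."""
    return sum(w * h(x)
               for xs, ws, h in zip(particles, smooth, hs)
               for x, w in zip(xs, ws))


demo_particles = [[0.0, 1.0], [0.5, -0.5], [0.2, 0.1]]
demo_weights = [[1.0, 1.0], [1.0, 3.0], [2.0, 1.0]]
demo_smooth = ffbs_marginal_weights(demo_particles, demo_weights, gaussian_transition)
demo_count = smoothed_additive(demo_particles, demo_smooth, [lambda x: 1.0] * 3)
```

Taking every $h_t$ equal to one returns $T + 1$ up to rounding, which is a convenient sanity check that the smoothing weights are normalized at each time step.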



2.2 The forward filtering backward simulation algorithm 

We now consider an algorithm whose complexity grows only linearly with the
number of particles for any functional on $\mathbb{X}^{T+1}$. For any
$t \in \{1, \dots, T\}$, we define

$$\mathcal{F}_t^N \stackrel{\mathrm{def}}{=} \sigma\left\{ (\xi_s^{N,\ell}, \omega_s^{N,\ell}) ; \ 0 \le s \le t, \ 1 \le \ell \le N \right\} .$$

The transition probabilities $\{\Lambda_t^N\}_{t=0}^{T-1}$ defined in (4)
induce an inhomogeneous Markov chain $\{J_u\}_{u=0}^{T}$ evolving backward in
time as follows. At time $T$, the random index $J_T$ is drawn from the set
$\{1, \dots, N\}$ with probability proportional to
$(\omega_T^{N,1}, \dots, \omega_T^{N,N})$. For any
$t \in \{0, \dots, T-1\}$, the index $J_t$ is sampled in the set
$\{1, \dots, N\}$ according to $\Lambda_t^N(J_{t+1}, \cdot)$. The joint
distribution of $J_{0:T}$ is therefore given, for
$j_{0:T} \in \{1, \dots, N\}^{T+1}$, by

$$\mathbb{P}\left[ J_{0:T} = j_{0:T} \mid \mathcal{F}_T^N \right] = \frac{\omega_T^{N,j_T}}{\Omega_T^N} \, \Lambda^N_{T-1}(j_T, j_{T-1}) \cdots \Lambda^N_0(j_1, j_0) . \tag{7}$$

Thus, the FFBS estimator (3) of the fixed-interval smoothing distribution may
be written as the conditional expectation

$$\phi^N_{0:T|T}[h] = \mathbb{E}\left[ h\left( \xi_0^{N,J_0}, \dots, \xi_T^{N,J_T} \right) \mid \mathcal{F}_T^N \right] ,$$

where $h$ is a bounded measurable function on $\mathbb{X}^{T+1}$. We may
therefore construct an unbiased estimator of the FFBS estimator given by

$$\tilde{\phi}^N_{0:T|T}[h] = N^{-1} \sum_{\ell=1}^{N} h\left( \xi_0^{N,J_0^\ell}, \dots, \xi_T^{N,J_T^\ell} \right) , \tag{8}$$

where $\{J^\ell_{0:T}\}_{\ell=1}^{N}$ are $N$ paths drawn independently given
$\mathcal{F}_T^N$ according to (7) and where $h$ is a bounded measurable
function on $\mathbb{X}^{T+1}$. This practical estimator was introduced in
[T^ (Algorithm 1, p. 158). An implementation of this estimator whose
complexity grows linearly in $N$ is introduced in [8].



3 Non-asymptotic deviation inequalities 

In this Section, the $L_q$-mean error bounds and exponential deviation
inequalities of the FFBS and FFBSi algorithms are established for additive
functionals of the form (6). Our results are established under the following
assumptions.

A1 (i) There exists $(\sigma_-, \sigma_+) \in (0, \infty)^2$ such that
$\sigma_- < \sigma_+$ and for any $(x, x') \in \mathbb{X}^2$,
$\sigma_- \le m(x, x') \le \sigma_+$, and we set
$\rho \stackrel{\mathrm{def}}{=} 1 - \sigma_-/\sigma_+$.

(ii) There exists $c_- \in \mathbb{R}_+^*$ such that
$\int \chi(dx) g_0(x) \ge c_-$ and for any $t \in \mathbb{N}^*$,
$\inf_{x \in \mathbb{X}} \int M(x, dx') g_t(x') \ge c_-$.

A2 (i) For all $t \ge 0$ and all $x \in \mathbb{X}$, $g_t(x) > 0$.

(ii) $\sup_{t \ge 0} |g_t|_\infty < \infty$.

A3 $\sup_{t \ge 1} |\vartheta_t|_\infty < \infty$,
$\sup_{t \ge 0} |p_t|_\infty < \infty$ and
$\sup_{t \ge 0} |\omega_t|_\infty < \infty$ where

$$\omega_0(x) \stackrel{\mathrm{def}}{=} \frac{d\chi}{d\rho_0}(x) \, g_0(x) , \qquad \omega_t(x, x') \stackrel{\mathrm{def}}{=} \frac{m(x, x') \, g_t(x')}{\vartheta_t(x) \, p_t(x, x')} , \quad \forall t \ge 1 .$$

Assumptions A1 and A2 give bounds for the model and assumption A3 for
quantities related to the algorithm. A1(i), referred to as the strong mixing
condition, is crucial to derive time-uniform exponential deviation inequalities
and a time-uniform bound of the variance of the marginal smoothing distribution
(see [7] and [5]). For all function $h$ from a space $E$ to $\mathbb{R}$,
$\mathrm{osc}(h)$ is defined by:

$$\mathrm{osc}(h) \stackrel{\mathrm{def}}{=} \sup_{(z, z') \in E^2} |h(z) - h(z')| .$$



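Theorem 1. Assume A1-A3. For all $q \ge 2$, there exists a constant $C$
(depending only on $q$, $\sigma_-$, $\sigma_+$, $c_-$,
$\sup_{t \ge 1} |\vartheta_t|_\infty$ and $\sup_{t \ge 0} |\omega_t|_\infty$)
such that for any $T < \infty$, any integer $r$ and any bounded and measurable
functions $\{h_s\}_{s=r}^{T}$,

$$\left\| \phi^N_{0:T|T}[S_{T,r}] - \phi_{0:T|T}[S_{T,r}] \right\|_q \le \frac{C}{\sqrt{N}} \, \Upsilon^N_{r,T} \left( \sum_{s=r}^{T} \mathrm{osc}(h_s)^2 \right)^{1/2} ,$$

where $S_{T,r}$ is defined by (6), $\phi^N_{0:T|T}$ is defined by (3) and where

$$\Upsilon^N_{r,T} \stackrel{\mathrm{def}}{=} \sqrt{r+1} \left( \sqrt{r+1} \wedge \sqrt{T-r+1} + \frac{\sqrt{r+1} \, \sqrt{T-r+1}}{\sqrt{N}} \right) .$$

Similarly,

$$\left\| \tilde{\phi}^N_{0:T|T}[S_{T,r}] - \phi_{0:T|T}[S_{T,r}] \right\|_q \le \frac{C}{\sqrt{N}} \, \Upsilon^N_{r,T} \left( \sum_{s=r}^{T} \mathrm{osc}(h_s)^2 \right)^{1/2} ,$$

where $\tilde{\phi}^N_{0:T|T}$ is defined by (8).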



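Remark 2. In the particular cases where $r = 0$ and $r = T$,
$\Upsilon^N_{0,T} = 1 + \sqrt{(T+1)/N}$ and
$\Upsilon^N_{T,T} = \sqrt{T+1} \left( 1 + \sqrt{(T+1)/N} \right)$. Then,
Theorem 1 gives

$$\left\| \phi^N_{0:T|T}[S_{T,0}] - \phi_{0:T|T}[S_{T,0}] \right\|_q \le \frac{C}{\sqrt{N}} \left( 1 + \sqrt{\frac{T+1}{N}} \right) \left( \sum_{t=0}^{T} \mathrm{osc}(h_t)^2 \right)^{1/2}$$

and

$$\left\| \phi^N_{0:T|T}[S_{T,T}] - \phi_{0:T|T}[S_{T,T}] \right\|_q \le C \sqrt{\frac{T+1}{N}} \left( 1 + \sqrt{\frac{T+1}{N}} \right) \mathrm{osc}(h_T) .$$

As stated in Section 1, these bounds improve the results given in [5] for the
FFBS estimator.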
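To make the $T/N$ dependence concrete, the small sketch below evaluates, with constants dropped, the rate obtained in the case $r = 0$ when the sum of squared oscillations is bounded by $(T+1) \max_t \mathrm{osc}(h_t)^2$, and compares it with the order $T/\sqrt{N}$ quoted in the introduction. Both function names are hypothetical and the numbers are orders of magnitude only, not actual error bounds.

```python
import math


def ffbs_rate(T, N):
    """sqrt((T+1)/N) * (1 + sqrt((T+1)/N)): depends on T and N only via (T+1)/N."""
    ratio = (T + 1) / N
    return math.sqrt(ratio) * (1.0 + math.sqrt(ratio))


def earlier_rate(T, N):
    """Order T / sqrt(N) of the earlier FFBS bound recalled in the introduction."""
    return T / math.sqrt(N)


# Doubling both T + 1 and N leaves the first rate unchanged but inflates the second.
rates_small = (ffbs_rate(99, 100), earlier_rate(100, 100))
rates_large = (ffbs_rate(199, 200), earlier_rate(200, 200))
```

Keeping $(T+1)/N$ fixed therefore keeps the first rate fixed, whereas the second one still grows like $\sqrt{T}$.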



Remark 3. The dependence on $1/\sqrt{N}$ is hardly surprising. Under the stated
strong mixing condition, it is known that the $L_q$-norm of the marginal
smoothing estimator $\phi^N_{t-r:t|T}[h]$, $t \in \{r, \dots, T\}$, is
uniformly bounded in time by

$$\left\| \phi^N_{t-r:t|T}[h] - \phi_{t-r:t|T}[h] \right\|_q \le C \, \mathrm{osc}(h) \, N^{-1/2}$$

(where $C$ depends only on $q$, $\sigma_-$, $\sigma_+$, $c_-$,
$\sup_{t \ge 1} |\vartheta_t|_\infty$ and $\sup_{t \ge 0} |\omega_t|_\infty$).
The dependence in $\sqrt{T}$ instead of $T$ reflects the forgetting property of
the filter and the backward smoother. As for $r \le s < t \le T$, the
estimators $\phi^N_{s-r:s|T}[h_s]$ and $\phi^N_{t-r:t|T}[h_t]$ become
asymptotically independent as $(t - s)$ gets large, the $L_q$-norm of the sum
$\sum_{t=r}^{T} \phi^N_{t-r:t|T}[h_t]$ scales as the sum of a mixing sequence
(see m-

Remark 4. It is easy to see that the scaling in $\sqrt{T/N}$ cannot in general
be improved. Assume that the kernel $m$ satisfies $m(x, x') = m(x')$ for all
$(x, x') \in \mathbb{X} \times \mathbb{X}$. In this case, for any
$t \in \{0, \dots, T\}$, the filtering distribution is

$$\phi_t[h_t] = \frac{\int m(x) \, g_t(x) \, h_t(x) \, dx}{\int m(x) \, g_t(x) \, dx} ,$$

and the backward kernel is the identity kernel. Hence, the fixed-interval
smoothing distribution coincides with the filtering distribution. If we assume
that we apply the bootstrap filter for which $p_s(x, x') = m(x')$ and
$\vartheta_s(x) = 1$, the estimators
$\{\phi^N_t[h_t]\}_{t \in \{0, \dots, T\}}$ are independent random variables
corresponding to importance sampling estimators. It is easily seen that

$$\left\| \sum_{t=0}^{T} \left( \phi^N_t[h_t] - \phi_t[h_t] \right) \right\|_q \le C \max_{0 \le t \le T} \{ \mathrm{osc}(h_t) \} \sqrt{\frac{T}{N}} .$$

Remark 5. The independent case also clearly illustrates why the path-space
methods are sub-optimal (see also [T] for a discussion). When applied to the
independent case (for all $(x, x') \in \mathbb{X} \times \mathbb{X}$,
$m(x, x') = m(x')$ and $p_s(x, x') = m(x')$), the asymptotic variance of the
path-space estimators is given in [1] by

^0:T\T[STfi] 

dcf^^ m{g'^) m{gt[ht - M^t)?) , m{9T[f^T - (/^Tihr)?) 



^2 

T-1 (t-1 



™(5t)^ m{gt) m{gj 



EE 



m{gf) m{g,[h, - (j3,{h,)Y) m{g^[ht - (j)t{ht)r) 



t=i u=o™*^^*)^ "^^^'^ rnigt)^ j 

The asymptotic variance thus increases as $T^2$ and hence, under the stated
assumptions, the variance of the path-space methods is of order $T^2/N$. It is
believed (and proved in some specific scenarios) that the same scaling holds
for path-space methods for non-degenerate Markov kernels (the result has been
formally established for strongly mixing kernels under the assumption that
$\sigma_-/\sigma_+$ is sufficiently close to 1).

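We provide below a brief outline of the main steps of the proofs (a detailed
proof is given in Section 5). Following [8], the proofs rely on a decomposition
of the smoothing error. For all $0 \le t \le T$ and all bounded and measurable
function $h$ on $\mathbb{X}^{T+1}$ define the kernel $L_{t,T}$ on
$\mathbb{X}^{t+1} \times \mathcal{X}^{\otimes (T+1)}$ by

$$L_{t,T}h(x_{0:t}) \stackrel{\mathrm{def}}{=} \int \prod_{u=t+1}^{T} M(x_{u-1}, dx_u) \, g_u(x_u) \, h(x_{0:T}) .$$

The fixed-interval smoothing distribution may then be expressed, for all
bounded and measurable function $h$ on $\mathbb{X}^{T+1}$, by

$$\phi_{0:T|T}[h] = \frac{\phi_{0:t|t}[L_{t,T}h]}{\phi_{0:t|t}[L_{t,T}\mathbf{1}]} ,$$

and this suggests to decompose the smoothing error as follows

$$\Delta^N_T[h] \stackrel{\mathrm{def}}{=} \phi^N_{0:T|T}[h] - \phi_{0:T|T}[h] \tag{9}$$

$$\phantom{\Delta^N_T[h]} = \sum_{t=0}^{T} \left\{ \frac{\phi^N_{0:t|t}[L_{t,T}h]}{\phi^N_{0:t|t}[L_{t,T}\mathbf{1}]} - \frac{\phi^N_{0:t-1|t-1}[L_{t-1,T}h]}{\phi^N_{0:t-1|t-1}[L_{t-1,T}\mathbf{1}]} \right\} ,$$

where we used the convention

$$\frac{\phi^N_{0:-1|-1}[L_{-1,T}h]}{\phi^N_{0:-1|-1}[L_{-1,T}\mathbf{1}]} = \frac{\phi_0[L_{0,T}h]}{\phi_0[L_{0,T}\mathbf{1}]} = \phi_{0:T|T}[h] .$$

Furthermore, for all $0 \le t \le T$,

$$\phi^N_{0:t|t}[L_{t,T}h] = \int \phi^N_{0:t|t}(dx_{0:t}) \, L_{t,T}h(x_{0:t}) = \int \phi^N_t(dx_t) \, \mathcal{L}^N_{t,T}h(x_t) ,$$

where $\mathcal{L}_{t,T}$ and $\mathcal{L}^N_{t,T}$ are two kernels on
$\mathbb{X} \times \mathcal{X}^{\otimes (T+1)}$ defined for all
$x_t \in \mathbb{X}$ by

$$\mathcal{L}_{t,T}h(x_t) \stackrel{\mathrm{def}}{=} \int B_{\phi_{t-1}}(x_t, dx_{t-1}) \cdots B_{\phi_0}(x_1, dx_0) \, L_{t,T}h(x_{0:t}) \tag{10}$$

$$\mathcal{L}^N_{t,T}h(x_t) \stackrel{\mathrm{def}}{=} \int B_{\phi^N_{t-1}}(x_t, dx_{t-1}) \cdots B_{\phi^N_0}(x_1, dx_0) \, L_{t,T}h(x_{0:t}) . \tag{11}$$

Note that $\mathcal{L}^N_{t,T}\mathbf{1} = \mathcal{L}_{t,T}\mathbf{1}$, since
$L_{t,T}\mathbf{1}$ depends on $x_{0:t}$ only through $x_t$. For all
$1 \le t \le T$ we can write

$$\frac{\phi^N_{0:t|t}[L_{t,T}h]}{\phi^N_{0:t|t}[L_{t,T}\mathbf{1}]} - \frac{\phi^N_{0:t-1|t-1}[L_{t-1,T}h]}{\phi^N_{0:t-1|t-1}[L_{t-1,T}\mathbf{1}]} = \frac{1}{\phi^N_t[\mathcal{L}^N_{t,T}\mathbf{1}]} \left( \phi^N_t[\mathcal{L}^N_{t,T}h] - \frac{\phi^N_{t-1}[\mathcal{L}^N_{t-1,T}h]}{\phi^N_{t-1}[\mathcal{L}^N_{t-1,T}\mathbf{1}]} \, \phi^N_t[\mathcal{L}^N_{t,T}\mathbf{1}] \right) ,$$

and then,

$$\Delta^N_T[h] = \sum_{t=0}^{T} \frac{N^{-1} \sum_{\ell=1}^{N} \omega_t^{N,\ell} \, G^N_{t,T}h(\xi_t^{N,\ell})}{N^{-1} \sum_{\ell=1}^{N} \omega_t^{N,\ell} \, \mathcal{L}_{t,T}\mathbf{1}(\xi_t^{N,\ell})} , \tag{12}$$

where $G^N_{t,T}$ is a kernel on $\mathbb{X} \times \mathcal{X}^{\otimes (T+1)}$
defined, for all $x_t \in \mathbb{X}$ and all bounded and measurable function
$h$ on $\mathbb{X}^{T+1}$, by

$$G^N_{t,T}h(x_t) \stackrel{\mathrm{def}}{=} \mathcal{L}^N_{t,T}h(x_t) - \frac{\phi^N_{t-1}[\mathcal{L}^N_{t-1,T}h]}{\phi^N_{t-1}[\mathcal{L}^N_{t-1,T}\mathbf{1}]} \, \mathcal{L}^N_{t,T}\mathbf{1}(x_t) ,$$

where, by the same convention as above,

$$G^N_{0,T}h(x_0) \stackrel{\mathrm{def}}{=} \mathcal{L}_{0,T}h(x_0) - \frac{\phi_0[\mathcal{L}_{0,T}h]}{\phi_0[\mathcal{L}_{0,T}\mathbf{1}]} \, \mathcal{L}_{0,T}\mathbf{1}(x_0) .$$

Two families of random variables $\{C^N_{t,T}(h)\}_{t=0}^{T}$ and
$\{D^N_{t,T}(h)\}_{t=0}^{T}$ are now introduced to transform (12) into a
suitable decomposition to compute an upper bound for the $L_q$-mean error. As
shown in Lemma 1, the random variables
$\{\omega_t^{N,\ell} G^N_{t,T}h(\xi_t^{N,\ell})\}_{\ell=1}^{N}$ are centered
given $\mathcal{F}^N_{t-1}$. The idea is to replace
$N^{-1} \sum_{\ell=1}^{N} \omega_t^{N,\ell} \mathcal{L}_{t,T}\mathbf{1}(\xi_t^{N,\ell})$
in (12) by its conditional expectation given $\mathcal{F}^N_{t-1}$ to get a
martingale difference. This conditional expectation is computed using the
following intermediate result. For any measurable function $h$ on $\mathbb{X}$
and any $t \in \{0, \dots, T\}$,

$$\mathbb{E}\left[ \omega_t^{N,1} h(\xi_t^{N,1}) \mid \mathcal{F}^N_{t-1} \right] = \frac{\phi^N_{t-1}[M g_t h]}{\phi^N_{t-1}[\vartheta_t]} . \tag{13}$$

Indeed,

$$\mathbb{E}\left[ \omega_t^{N,1} h(\xi_t^{N,1}) \mid \mathcal{F}^N_{t-1} \right] = \frac{\sum_{i=1}^{N} \omega_{t-1}^{N,i} \, \vartheta_t(\xi_{t-1}^{N,i}) \int p_t(\xi_{t-1}^{N,i}, x) \, \frac{m(\xi_{t-1}^{N,i}, x) \, g_t(x)}{\vartheta_t(\xi_{t-1}^{N,i}) \, p_t(\xi_{t-1}^{N,i}, x)} \, h(x) \, \lambda(dx)}{\sum_{i=1}^{N} \omega_{t-1}^{N,i} \, \vartheta_t(\xi_{t-1}^{N,i})} = \frac{\phi^N_{t-1}[M g_t h]}{\phi^N_{t-1}[\vartheta_t]} .$$

This result, applied with the function $h = \mathcal{L}_{t,T}\mathbf{1}$,
yields

$$\mathbb{E}\left[ \omega_t^{N,1} \mathcal{L}_{t,T}\mathbf{1}(\xi_t^{N,1}) \mid \mathcal{F}^N_{t-1} \right] = \frac{\phi^N_{t-1}[M g_t \mathcal{L}_{t,T}\mathbf{1}]}{\phi^N_{t-1}[\vartheta_t]} = \frac{\phi^N_{t-1}[\mathcal{L}_{t-1,T}\mathbf{1}]}{\phi^N_{t-1}[\vartheta_t]} .$$

For any $0 \le t \le T$, define for all bounded and measurable function $h$ on
$\mathbb{X}^{T+1}$,

$$D^N_{t,T}(h) \stackrel{\mathrm{def}}{=} \mathbb{E}\left[ \omega_t^{N,1} \frac{\mathcal{L}_{t,T}\mathbf{1}(\xi_t^{N,1})}{|\mathcal{L}_{t,T}\mathbf{1}|_\infty} \,\Big|\, \mathcal{F}^N_{t-1} \right]^{-1} N^{-1} \sum_{\ell=1}^{N} \omega_t^{N,\ell} \frac{G^N_{t,T}h(\xi_t^{N,\ell})}{|\mathcal{L}_{t,T}\mathbf{1}|_\infty} , \tag{14}$$

$$C^N_{t,T}(h) \stackrel{\mathrm{def}}{=} \left[ \frac{1}{N^{-1} \sum_{\ell=1}^{N} \omega_t^{N,\ell} \frac{\mathcal{L}_{t,T}\mathbf{1}(\xi_t^{N,\ell})}{|\mathcal{L}_{t,T}\mathbf{1}|_\infty}} - \frac{1}{\mathbb{E}\left[ \omega_t^{N,1} \frac{\mathcal{L}_{t,T}\mathbf{1}(\xi_t^{N,1})}{|\mathcal{L}_{t,T}\mathbf{1}|_\infty} \,\Big|\, \mathcal{F}^N_{t-1} \right]} \right] N^{-1} \sum_{\ell=1}^{N} \omega_t^{N,\ell} \frac{G^N_{t,T}h(\xi_t^{N,\ell})}{|\mathcal{L}_{t,T}\mathbf{1}|_\infty} . \tag{15}$$

Using these notations, (12) can be rewritten as follows:

$$\Delta^N_T[h] = \sum_{t=0}^{T} D^N_{t,T}(h) + \sum_{t=0}^{T} C^N_{t,T}(h) . \tag{16}$$

For any $q \ge 2$, the derivation of the upper bound relies on the triangle
inequality:

$$\left\| \Delta^N_T[S_{T,r}] \right\|_q \le \left\| \sum_{t=0}^{T} D^N_{t,T}(S_{T,r}) \right\|_q + \sum_{t=0}^{T} \left\| C^N_{t,T}(S_{T,r}) \right\|_q ,$$

where $S_{T,r}$ is defined in (6). The proof for the FFBS estimator
$\phi^N_{0:T|T}$ is completed by using Proposition 1 and Proposition 2.
According to (16), the smoothing error can be decomposed into a sum of two
terms which are considered separately. The first one is a martingale whose
$L_q$-mean error is upper-bounded by $\sqrt{(T+1)/N}$ as shown in
Proposition 1. The second one is a sum of products, the $L_q$-norm of each
being bounded by $1/N$ in Proposition 2.

The end of this section is devoted to the exponential deviation inequality
for the error $\Delta^N_T[S_{T,r}]$ defined by (9). We use the decomposition of
$\Delta^N_T[S_{T,r}]$ obtained in (16), leading to a similar dependence on the
ratio $(T+1)/N$. The martingale term $D^N_{t,T}(S_{T,r})$ is dealt with using
the Azuma-Hoeffding inequality while the term $C^N_{t,T}(S_{T,r})$ needs a
specific Hoeffding-type inequality for ratio of random variables.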



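Theorem 2. Assume A1-A3. There exists a constant $C$ (depending only on
$\sigma_-$, $\sigma_+$, $r$, $c_-$, $\sup_{t \ge 1} |\vartheta_t|_\infty$ and
$\sup_{t \ge 0} |\omega_t|_\infty$) such that for any $T < \infty$, any
$N \ge 1$, any $\varepsilon > 0$, any integer $r$, and any bounded and
measurable functions $\{h_s\}_{s=r}^{T}$,

$$\mathbb{P}\left\{ \left| \phi^N_{0:T|T}[S_{T,r}] - \phi_{0:T|T}[S_{T,r}] \right| > \varepsilon \right\} \le 2 \exp\left( - \frac{C N \varepsilon^2}{\Theta_{r,T} \sum_{s=r}^{T} \mathrm{osc}(h_s)^2} \right) + 8 \exp\left( - \frac{C N \varepsilon}{(1+r) \sum_{s=r}^{T} \mathrm{osc}(h_s)} \right) ,$$

where $S_{T,r}$ is defined by (6), $\phi^N_{0:T|T}$ is defined by (3) and where

$$\Theta_{r,T} \stackrel{\mathrm{def}}{=} (1 + r) \left\{ (1 + r) \wedge (T - r + 1) \right\} . \tag{17}$$

Similarly,

$$\mathbb{P}\left\{ \left| \tilde{\phi}^N_{0:T|T}[S_{T,r}] - \phi_{0:T|T}[S_{T,r}] \right| > \varepsilon \right\} \le 4 \exp\left( - \frac{C N \varepsilon^2}{\Theta_{r,T} \sum_{s=r}^{T} \mathrm{osc}(h_s)^2} \right) + 8 \exp\left( - \frac{C N \varepsilon}{(1+r) \sum_{s=r}^{T} \mathrm{osc}(h_s)} \right) ,$$

where $\tilde{\phi}^N_{0:T|T}$ is defined by (8).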



4 Monte Carlo Experiments

In this section, the performance of the FFBSi algorithm is evaluated through 
simulations and compared to the path-space method. 

4.1 Linear gaussian model

Let us consider the following model:

$$X_{t+1} = \phi X_t + \sigma_u U_t , \qquad Y_t = X_t + \sigma_v V_t ,$$

where $X_0$ is a zero-mean random variable with variance
$\sigma_u^2 / (1 - \phi^2)$, $\{U_t\}_{t \ge 0}$ and $\{V_t\}_{t \ge 0}$ are
two sequences of independent and identically distributed standard gaussian
random variables (independent from $X_0$). The parameters
$(\phi, \sigma_u, \sigma_v)$ are assumed to be known. Observations were
generated using $\phi = 0.9$, $\sigma_u = 0.6$ and $\sigma_v = 1$. Table 1
provides the empirical variance of the estimation of the unnormalized smoothed
additive functional
$I_T \stackrel{\mathrm{def}}{=} \sum_{t=0}^{T} \mathbb{E}[X_t \mid Y_{0:T}]$
given by the path-space and the FFBSi methods over 250 independent Monte Carlo
experiments. We display in Figure 1 the empirical variance for different values
of $N$ as a function of $T$ for both estimators. These estimates are
represented by dots and a linear regression (resp. quadratic regression) is
also provided for the FFBSi algorithm (resp. for the path-space method).

In Figure 2, the FFBSi algorithm is compared to the path-space method
to compute the smoothed value of the empirical mean $(T+1)^{-1} I_T$. For the
purpose of comparison, this quantity is computed using the Kalman smoother.
We display in Figure 2 the box and whisker plots of the estimations obtained
with 100 independent Monte Carlo experiments. The FFBSi algorithm clearly
outperforms the other method for comparable computational costs. In Table 2,
the mean CPU times over the 100 runs of the two methods are given as a
function of the number of particles (for $T = 500$ and $T = 1000$).

Table 1: Empirical variance for different values of T and N.

Path-space
35.6  483.2  326.4  89.7  9.7  11.2  7.1  4.9  3.7  16.5  10.5  6.7  5.1  25.6  14.1  7.8

Table 2: Average CPU time to compute the smoothed value of the empirical
mean in the LGM.

T = 500
Method               N        CPU time (s)
FFBSi                500      4.87
Path-space method    500      0.24
Path-space method    5000     2.47
Path-space method    10000    4.65

T = 1000
Method               N        CPU time (s)
FFBSi                1000     16.5
Path-space method    1000     0.9
Path-space method    10000    8.5
Path-space method    20000    17.2
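For readers who want to reproduce the reference value used in the linear Gaussian experiment, the sketch below simulates the model of Section 4.1 and computes the smoothed means with a scalar Kalman filter followed by a Rauch-Tung-Striebel backward pass. It is an illustration written for this model only; the function names and the default seed are hypothetical and it is not the code used for the reported experiments.

```python
import math
import random


def simulate_lgm(T, phi=0.9, sigma_u=0.6, sigma_v=1.0, seed=0):
    """Draw (X_0..X_T, Y_0..Y_T) from the linear Gaussian model."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigma_u / math.sqrt(1.0 - phi ** 2))
    xs, ys = [], []
    for _ in range(T + 1):
        xs.append(x)
        ys.append(x + sigma_v * rng.gauss(0.0, 1.0))
        x = phi * x + sigma_u * rng.gauss(0.0, 1.0)
    return xs, ys


def kalman_smoothed_means(ys, phi=0.9, sigma_u=0.6, sigma_v=1.0):
    """Return E[X_t | Y_0..Y_T] for t = 0..T (Kalman filter, then RTS pass)."""
    pred_m, pred_p = 0.0, sigma_u ** 2 / (1.0 - phi ** 2)
    preds, filt = [], []
    for y in ys:
        gain = pred_p / (pred_p + sigma_v ** 2)
        m = pred_m + gain * (y - pred_m)
        p = (1.0 - gain) * pred_p
        preds.append((pred_m, pred_p))  # one-step prediction for this time
        filt.append((m, p))             # filtered mean and variance
        pred_m, pred_p = phi * m, phi ** 2 * p + sigma_u ** 2
    smooth = [0.0] * len(ys)
    smooth[-1] = filt[-1][0]
    for t in range(len(ys) - 2, -1, -1):
        m, p = filt[t]
        next_pred_m, next_pred_p = preds[t + 1]
        back_gain = p * phi / next_pred_p
        smooth[t] = m + back_gain * (smooth[t + 1] - next_pred_m)
    return smooth


def smoothed_empirical_mean(ys, **params):
    """(T+1)^{-1} times the sum of the smoothed means."""
    means = kalman_smoothed_means(ys, **params)
    return sum(means) / len(means)


demo_states, demo_obs = simulate_lgm(10)
demo_reference = smoothed_empirical_mean(demo_obs)
```

A particle estimate of the same quantity can then be compared with `demo_reference` over repeated runs to obtain empirical variances of the kind reported in the tables.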



Figure 1: Empirical variance of the path-space (top) and FFBSi (bottom) for
N = 300 (dotted line), N = 750 (dashed line) and N = 1500 (bold line).

4.2 Stochastic Volatility Model

Stochastic volatility models (SVM) have been introduced to provide better ways
of modeling financial time series data than ARCH/GARCH models (fijj). We
consider the elementary SVM model introduced by [14]:

$$X_{t+1} = \phi X_t + \sigma U_{t+1} , \qquad Y_t = \beta e^{X_t/2} V_t ,$$

where $X_0$ is a zero-mean random variable with variance
$\sigma^2 / (1 - \phi^2)$, $\{U_t\}_{t \ge 0}$ and $\{V_t\}_{t \ge 0}$ are two
sequences of independent and identically distributed standard gaussian random
variables (independent from $X_0$). This model was used to generate simulated
data with parameters ($\phi = 0.3$, $\sigma = 0.5$, $\beta = 1$) assumed to be
known in the following experiments. The empirical variance of the estimation
of $I_T$ given by the path-space and the FFBSi methods over 250 independent
Monte Carlo experiments is displayed in Table 3. We display in Figure 3 the
empirical variance for different values of $N$ as a function of $T$ for both
estimators.
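A quick way to read the growth of the variance with $T$ is to fit a slope on a log-log scale. The sketch below does this for the $N = 300$ column of Table 3 for both methods; the helper `loglog_slope` is hypothetical and the fit is a rough diagnostic on five points, not a substitute for the regressions shown in the figures.

```python
import math

# Empirical variances from Table 3 (SVM), column N = 300, rows T = 300..1500.
T_VALUES = [300, 500, 750, 1000, 1500]
PATH_SPACE_VAR = [52.7, 116.3, 184.7, 307.7, 512.1]
FFBSI_VAR = [1.2, 2.1, 3.7, 4.0, 7.3]


def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den


slope_path_space = loglog_slope(T_VALUES, PATH_SPACE_VAR)
slope_ffbsi = loglog_slope(T_VALUES, FFBSI_VAR)
```

On these five points the fitted exponent is noticeably larger for the path-space method than for the FFBSi, which is close to linear growth in $T$.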



5 Proof of Theorem 1

We preface the proof of Proposition 1 by the following Lemma:

Lemma 1. Under assumptions A1-A3, we have, for any $t \in \{0, \dots, T\}$ and
any measurable function $h$ on $\mathbb{X}^{T+1}$:

(i) The random variables
$\left\{ \omega_t^{N,\ell} \, G^N_{t,T}h(\xi_t^{N,\ell}) / |\mathcal{L}_{t,T}\mathbf{1}|_\infty \right\}_{\ell=1}^{N}$
are, for all $N \in \mathbb{N}$:

(a) conditionally independent and identically distributed given
$\mathcal{F}^N_{t-1}$,

(b) centered conditionally to $\mathcal{F}^N_{t-1}$,

where $G^N_{t,T}h$ and $\mathcal{L}_{t,T}$ are defined in Section 3.

(ii) For any integers $r$, $t$ and $N$:

$$\frac{\left| G^N_{t,T} S_{T,r} \right|_\infty}{|\mathcal{L}_{t,T}\mathbf{1}|_\infty} \le \sum_{s=r}^{T} \rho^{\max(t-s, \, s-r-t, \, 0)} \, \mathrm{osc}(h_s) , \tag{18}$$

where $S_{T,r}$ and $\rho$ are respectively defined in (6) and in A1(i).

(iii) For all $x \in \mathbb{X}$,
$\frac{\mathcal{L}_{t,T}\mathbf{1}(x)}{|\mathcal{L}_{t,T}\mathbf{1}|_\infty} \ge \frac{\sigma_-}{\sigma_+}$
and
$\frac{\mathcal{L}_{t-1,T}\mathbf{1}(x)}{|\mathcal{L}_{t,T}\mathbf{1}|_\infty} \ge c_- \frac{\sigma_-}{\sigma_+}$.

Figure 2: Computation of smoothed additive functionals in a linear gaussian
model. The variance of the estimation given by the FFBSi algorithm is the
smallest one in both cases. Panel (a): time T = 500, true value -0.0469,
genealogical tree with N = 500, N = 5000 and N = 10000 and FFBSi with N = 500.
Panel (b): time T = 1000, true value 0.3983, genealogical tree with N = 1000,
N = 10000 and N = 20000 and FFBSi with N = 1000.

Table 3: Empirical variance for different values of T and N in the SVM.

Path-space method
T \ N    300     500     750     1000    1500    5000    10000   15000   20000
300      52.7    33.7    22.0    17.8    12.3    3.8     2.0     1.4     1.2
500      116.3   84.8    64.8    53.5    30.7    11.4    6.8     4.1     2.8
750      184.7   187.6   134.2   120.0   65.8    29.1    12.8    7.3     7.7
1000     307.7   240.4   244.7   182.8   133.2   43.6    24.5    15.6    11.6
1500     512.1   487.5   445.5   359.9   249.5   90.9    52.0    32.6    29.3

FFBSi
T \ N    300     500     750     1000    1500
300      1.2     0.6     0.5     0.4     0.2
500      2.1     1.2     0.8     0.6     0.4
750      3.7     1.8     1.4     0.9     0.6
1000     4.0     2.7     1.8     1.3     0.9
1500     7.3     3.8     3.1     1.6     1.4

Figure 3: Empirical variance of the path-space (top) and FFBSi (bottom) for
N = 300 (dotted line), N = 750 (dashed line) and N = 1500 (bold line) in the
SVM, as a function of the number of observations T.

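Proof. The proof of (i) is given by [8, Lemma 3].

Proof of (ii). Let $\Pi_{s-r:s,T}$ be the operator which associates to any
bounded and measurable function $h$ on $\mathbb{X}^{r+1}$ the function
$\Pi_{s-r:s,T}h$ given, for any $(x_0, \dots, x_T) \in \mathbb{X}^{T+1}$, by

$$\Pi_{s-r:s,T}h(x_{0:T}) \stackrel{\mathrm{def}}{=} h(x_{s-r:s}) .$$

Then, we may write $S_{T,r} = \sum_{s=r}^{T} \Pi_{s-r:s,T}h_s$ and
$G^N_{t,T}S_{T,r} = \sum_{s=r}^{T} G^N_{t,T}\Pi_{s-r:s,T}h_s$. By definition of
$G^N_{t,T}$, we have

$$\frac{G^N_{t,T}h(x_t)}{\mathcal{L}^N_{t,T}\mathbf{1}(x_t)} = \frac{\mathcal{L}^N_{t,T}h(x_t)}{\mathcal{L}^N_{t,T}\mathbf{1}(x_t)} - \frac{\phi^N_{t-1}[\mathcal{L}^N_{t-1,T}h]}{\phi^N_{t-1}[\mathcal{L}^N_{t-1,T}\mathbf{1}]} ,$$

and, following the same lines as in [8, Lemma 10],

$$\left| G^N_{t,T}\Pi_{s-r:s,T}h_s \right|_\infty \le \rho^{s-r-t} \, \mathrm{osc}(h_s) \, |\mathcal{L}_{t,T}\mathbf{1}|_\infty \quad \text{if } t \le s - r ,$$

$$\left| G^N_{t,T}\Pi_{s-r:s,T}h_s \right|_\infty \le \rho^{t-s} \, \mathrm{osc}(h_s) \, |\mathcal{L}_{t,T}\mathbf{1}|_\infty \quad \text{if } t > s ,$$

where $\rho$ is defined in A1(i). Furthermore, for any $s - r < t \le s$,

$$\left| G^N_{t,T}\Pi_{s-r:s,T}h_s \right|_\infty \le \mathrm{osc}(h_s) \, |\mathcal{L}_{t,T}\mathbf{1}|_\infty ,$$

which shows (ii).

Proof of (iii). From the definition (10), for all $x \in \mathbb{X}$ and all
$t \in \{1, \dots, T\}$,

$$\mathcal{L}_{t,T}\mathbf{1}(x) = \int m(x, x_{t+1}) \, g_{t+1}(x_{t+1}) \prod_{u=t+2}^{T} M(x_{u-1}, dx_u) \, g_u(x_u) \, \lambda(dx_{t+1}) ,$$

hence, by assumption A1(i),

$$|\mathcal{L}_{t,T}\mathbf{1}|_\infty \le \sigma_+ \int g_{t+1}(x_{t+1}) \, \mathcal{L}_{t+1,T}\mathbf{1}(x_{t+1}) \, \lambda(dx_{t+1}) ,$$

$$\mathcal{L}_{t,T}\mathbf{1}(x) \ge \sigma_- \int g_{t+1}(x_{t+1}) \, \mathcal{L}_{t+1,T}\mathbf{1}(x_{t+1}) \, \lambda(dx_{t+1}) ,$$

which concludes the proof of the first statement. By construction, for any
$x \in \mathbb{X}$ and any $t \in \{1, \dots, T\}$,

$$\mathcal{L}_{t-1,T}\mathbf{1}(x) = \int M(x, dx') \, g_t(x') \, \mathcal{L}_{t,T}\mathbf{1}(x') ,$$

and then, by assumption A1,

$$\frac{\mathcal{L}_{t-1,T}\mathbf{1}(x)}{|\mathcal{L}_{t,T}\mathbf{1}|_\infty} = \int M(x, dx') \, g_t(x') \, \frac{\mathcal{L}_{t,T}\mathbf{1}(x')}{|\mathcal{L}_{t,T}\mathbf{1}|_\infty} \ge c_- \frac{\sigma_-}{\sigma_+} .$$

□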



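Proposition 1. Assume A1-A3. For all $q \ge 2$, there exists a constant $C$
(depending only on $q$, $\sigma_-$, $\sigma_+$, $c_-$,
$\sup_{t \ge 1} |\vartheta_t|_\infty$ and $\sup_{t \ge 0} |\omega_t|_\infty$)
such that for any $T < \infty$, any integer $r$ and any bounded and measurable
functions $\{h_s\}_{s=r}^{T}$ on $\mathbb{X}^{r+1}$,

$$\left\| \sum_{t=0}^{T} D^N_{t,T}(S_{T,r}) \right\|_q \le \frac{C}{\sqrt{N}} \sqrt{1+r} \left( \sqrt{1+r} \wedge \sqrt{T-r+1} \right) \left( \sum_{s=r}^{T} \mathrm{osc}(h_s)^2 \right)^{1/2} , \tag{19}$$

where $D^N_{t,T}$ is defined in (14).

Proof. Since $\{D^N_{t,T}(S_{T,r})\}_{0 \le t \le T}$ is a forward martingale
difference and $q \ge 2$, Burkholder's inequality (see [13, Theorem 2.10,
page 23]) states the existence of a constant $C$ depending only on $q$ such
that:

$$\mathbb{E}\left[ \left| \sum_{t=0}^{T} D^N_{t,T}(S_{T,r}) \right|^q \right] \le C \, \mathbb{E}\left[ \left( \sum_{t=0}^{T} D^N_{t,T}(S_{T,r})^2 \right)^{q/2} \right] .$$

Moreover, by application of the last statement of Lemma 1(iii),

$$\mathbb{E}\left[ \omega_t^{N,1} \frac{\mathcal{L}_{t,T}\mathbf{1}(\xi_t^{N,1})}{|\mathcal{L}_{t,T}\mathbf{1}|_\infty} \,\Big|\, \mathcal{F}^N_{t-1} \right] = \frac{\phi^N_{t-1}\left[ \mathcal{L}_{t-1,T}\mathbf{1} / |\mathcal{L}_{t,T}\mathbf{1}|_\infty \right]}{\phi^N_{t-1}[\vartheta_t]} \ge \frac{c_- \sigma_-}{\sigma_+ \sup_{t \ge 1} |\vartheta_t|_\infty} ,$$

and thus,

$$\mathbb{E}\left[ \left| \sum_{t=0}^{T} D^N_{t,T}(S_{T,r}) \right|^q \right] \le C \left( \frac{\sigma_+ \sup_{t \ge 1} |\vartheta_t|_\infty}{\sigma_- c_-} \right)^q \mathbb{E}\left[ \left( \sum_{t=0}^{T} \left| N^{-1} \sum_{\ell=1}^{N} a^{N,\ell}_{t,T} \right|^2 \right)^{q/2} \right] ,$$

where $a^{N,\ell}_{t,T} \stackrel{\mathrm{def}}{=} \omega_t^{N,\ell} \,
G^N_{t,T}S_{T,r}(\xi_t^{N,\ell}) / |\mathcal{L}_{t,T}\mathbf{1}|_\infty$. By
the Minkowski inequality,

$$\left\| \sum_{t=0}^{T} D^N_{t,T}(S_{T,r}) \right\|_q \le C \left\{ \sum_{t=0}^{T} \left( \mathbb{E}\left[ \left| N^{-1} \sum_{\ell=1}^{N} a^{N,\ell}_{t,T} \right|^q \right] \right)^{2/q} \right\}^{1/2} . \tag{20}$$

Since for any $t \ge 0$ the random variables
$\{a^{N,\ell}_{t,T}\}_{\ell=1}^{N}$ are conditionally independent and centered
conditionally to $\mathcal{F}^N_{t-1}$, using again the Burkholder and the
Jensen inequalities we obtain

$$\mathbb{E}\left[ \left| \sum_{\ell=1}^{N} a^{N,\ell}_{t,T} \right|^q \,\Big|\, \mathcal{F}^N_{t-1} \right] \le C N^{q/2-1} \sum_{\ell=1}^{N} \mathbb{E}\left[ \left| a^{N,\ell}_{t,T} \right|^q \,\Big|\, \mathcal{F}^N_{t-1} \right] \le C \left( \sum_{s=r}^{T} \rho^{\max(t-s, \, s-r-t, \, 0)} \, \mathrm{osc}(h_s) \right)^q N^{q/2} , \tag{21}$$

where the last inequality comes from (18). Finally, by (20) and (21) we get

$$\left\| \sum_{t=0}^{T} D^N_{t,T}(S_{T,r}) \right\|_q \le \frac{C}{\sqrt{N}} \left\{ \sum_{t=0}^{T} \left( \sum_{s=r}^{T} \rho^{\max(t-s, \, s-r-t, \, 0)} \, \mathrm{osc}(h_s) \right)^2 \right\}^{1/2} .$$

By the Hölder inequality, we have

$$\sum_{s=r}^{T} \rho^{\max(t-s, \, s-r-t, \, 0)} \, \mathrm{osc}(h_s) \le \left( \sum_{s=r}^{T} \rho^{\max(t-s, \, s-r-t, \, 0)} \right)^{1/2} \left( \sum_{s=r}^{T} \rho^{\max(t-s, \, s-r-t, \, 0)} \, \mathrm{osc}(h_s)^2 \right)^{1/2} ,$$

which yields

$$\left\{ \sum_{t=0}^{T} \left( \sum_{s=r}^{T} \rho^{\max(t-s, \, s-r-t, \, 0)} \, \mathrm{osc}(h_s) \right)^2 \right\}^{1/2} \le C (1 + r) \left( \sum_{s=r}^{T} \mathrm{osc}(h_s)^2 \right)^{1/2} .$$

We obtain similarly

$$\left\{ \sum_{t=0}^{T} \left( \sum_{s=r}^{T} \rho^{\max(t-s, \, s-r-t, \, 0)} \, \mathrm{osc}(h_s) \right)^2 \right\}^{1/2} \le C \sqrt{1+r} \, \sqrt{T-r+1} \left( \sum_{s=r}^{T} \mathrm{osc}(h_s)^2 \right)^{1/2} ,$$

which concludes the proof.

□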



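Proposition 2. Assume A1-A3. For all $q \ge 2$, there exists a constant $C$
(depending only on $q$, $\sigma_-$, $\sigma_+$, $c_-$,
$\sup_{t \ge 1} |\vartheta_t|_\infty$ and $\sup_{t \ge 0} |\omega_t|_\infty$)
such that for any $T < +\infty$, any $0 \le t \le T$, any integer $r$, and any
bounded and measurable functions $\{h_s\}_{s=r}^{T}$ on $\mathbb{X}^{r+1}$,

$$\left\| C^N_{t,T}(S_{T,r}) \right\|_q \le \frac{C}{N} \sum_{s=r}^{T} \rho^{\max(t-s, \, s-r-t, \, 0)} \, \mathrm{osc}(h_s) , \tag{22}$$

where $C^N_{t,T}$ is defined in (15).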

Proof. According to (15 1, G^rp[ST,r) can be written 
where 



N 
t,T 



7V-if]f 



N 



N,l^t.T'i-{£,^'^) 



IA,t1| 



oo 



A,Ti(gr) 

|-^t,Tl|oo 



E 



N.l Ct,TUit'^) 
|£t,TlU 



|£t,Tl|o 



(23) 
and where $\Omega_t^N$ is defined by (5). Using the last statement of Lemma 1
we get the following bound:



E 



IA,t1| 



1 [A-I.tI/IA.tIIoo] 



> 



which implies 



(24) 



Then, \C^rp(^ST.r)\ < C iVtrl ^"^^ '^^'^ decomposition 



AT— 1 V~>-'V Af,£ AT— 1 V^iV r 



1 v^Af N.e 



E 



17f 



17f E 



E 



N 



where a,^^' = Lo^'' '''fc'7i\t' ^ ^nd f2f N^^n^^ . By (jTs]), E 



and then, by A 1 A3 

1 



N.l 



t-l 



and (18), 



E 



f7f 



< and F W < C 



t,T ^ ^ Piloo „max(t-s,s-r-t,0) 



17fE 



T 

Hp- 



Therefore, |C/^3,(5T,r)| < C (c};^ + ^J^, , 



.max(t— s,s— r— i,0)j^t,i 



osc(/is)Cj^''^ ) where 



N 



l,Ar d£f -j^jv 
t,T 



y& ■ N-^Y.'^t.T and C, 



-,2,^ dof , ,jv 



E 



N 



N 



( NIC Id"'') 1 ^ 

The random variables I ' |'^^ ^^^j — -j being bounded and conditionally 

independent given following the same steps as in the proof of Proposition 

3 there exists a constant C (depending only on q, (t_, c_ and supjcj^oo) 
t>o 

such that IIVt^L <CN~^/'^. Similarly 



I2g 



AT 



< C 



J2s=r P 



max(t — s,s— r— 1,0) 



osc{hs) 



2q 



and 



E 



17 



- n 



N 



Ari/2 



C 

2q - 7V1/2 



The Cauchy-Schwarz inequality concludes the proof of ( 22 1 . 



□ 
The proof of Theorem 1 is now concluded for the FFBS estimator
$\phi^N_{0:T|T}[S_{T,r}]$ and we can proceed to the proof for the FFBSi
estimator. We preface the proof of Theorem 1 for the FFBSi estimator
$\tilde{\phi}^N_{0:T|T}$ by the following Lemma. We first define the backward
filtration $\mathcal{G}^N_{t,T}$ by

$$\mathcal{G}^N_{t,T} \stackrel{\mathrm{def}}{=} \mathcal{F}^N_T \vee \sigma\left\{ J_u^\ell , \ 1 \le \ell \le N , \ t \le u \le T \right\} , \quad \forall t \in \{0, \dots, T\} .$$



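Lemma 2. Assume A1-3. Let $\ell \in \{1,\dots,N\}$ and $T < +\infty$. For any bounded
measurable function $h$ on $\mathbb{X}^{r+1}$ we have,

(i) for all $u$, $t$ such that $r \le t \le u \le T$,

$$\left|\mathbb{E}\left[h\big(\xi^{N,J^{\ell}_{t-r:t}}_{t-r:t}\big)\,\middle|\,\mathcal{G}^N_{u,T}\right] - \mathbb{E}\left[h\big(\xi^{N,J^{\ell}_{t-r:t}}_{t-r:t}\big)\,\middle|\,\mathcal{G}^N_{u+1,T}\right]\right| \le \rho^{u-t}\,\mathrm{osc}(h),$$

where $\rho$ is defined in A1(i).

(ii) for all $u$, $t$ such that $t-r \le u \le t-1 \le T$,

$$\left|\mathbb{E}\left[h\big(\xi^{N,J^{\ell}_{t-r:t}}_{t-r:t}\big)\,\middle|\,\mathcal{G}^N_{u,T}\right] - \mathbb{E}\left[h\big(\xi^{N,J^{\ell}_{t-r:t}}_{t-r:t}\big)\,\middle|\,\mathcal{G}^N_{u+1,T}\right]\right| \le \mathrm{osc}(h).$$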



Proof. According to Section 2.2, for all $\ell \in \{1,\dots,N\}$, $\{J_u^{\ell}\}_{u=0}^{T}$ is an
inhomogeneous Markov chain evolving backward in time with backward kernel
$\{\Lambda^N_u\}_{u=0}^{T-1}$. For any $r \le t \le u \le T$, we have



E 



h ( 



->N 



E 



rN 

yu+i 



+ 1.T 



E 



Lm=T 



t+l 



t-r+1 



Jt-r:t-l i=t 



The RHS of this equation is the difference between two expectations started with
two different initial distributions. Under A1(i), the backward kernel satisfies the
uniform Doeblin condition,

$$\forall (i,j) \in \{1,\dots,N\}^2, \quad \Lambda^N_s(i,j) \ge \frac{\sigma_-}{\sigma_+}\,\frac{\omega_s^{N,j}}{\Omega_s^N},$$

and the proof is completed by the exponential forgetting of the backward kernel
(see [7]). The proof of (ii) follows exactly the same lines. □
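The forgetting argument used in this proof can be checked numerically on a toy particle system. The following is a minimal sketch, not the authors' code: the transition density `m`, its bounds `SIGMA_MINUS` and `SIGMA_PLUS`, and the randomly generated particles and weights are all hypothetical choices made for illustration. It builds backward kernels whose rows are proportional to the weights times the transition density, checks a Doeblin-type lower bound with constant `SIGMA_MINUS / SIGMA_PLUS`, and checks that two backward index chains started from different indices merge in total variation at least as fast as `rho ** n` with `rho = 1 - SIGMA_MINUS / SIGMA_PLUS`.

```python
import math
import random

SIGMA_MINUS, SIGMA_PLUS = 0.5, 1.5  # assumed bounds on the transition density


def m(x, y):
    """Hypothetical transition density with SIGMA_MINUS <= m <= SIGMA_PLUS."""
    return SIGMA_MINUS + (SIGMA_PLUS - SIGMA_MINUS) * math.exp(-(x - y) ** 2)


def backward_kernel(weights, particles, next_particles):
    """Row i is the law of the time-s index given index i at time s + 1."""
    rows = []
    for x_next in next_particles:
        unnorm = [w * m(x, x_next) for w, x in zip(weights, particles)]
        total = sum(unnorm)
        rows.append([v / total for v in unnorm])
    return rows


def push(dist, kernel):
    """Row vector times row-stochastic matrix."""
    return [sum(d * row[j] for d, row in zip(dist, kernel))
            for j in range(len(kernel[0]))]


def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


rng = random.Random(0)
N, STEPS = 6, 8
particles = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(STEPS + 1)]
weights = [[rng.uniform(0.1, 1.0) for _ in range(N)] for _ in range(STEPS + 1)]

eps = SIGMA_MINUS / SIGMA_PLUS
rho = 1.0 - eps
mu = [1.0] + [0.0] * (N - 1)   # chain started from index 0 at the final time
nu = [0.0] * (N - 1) + [1.0]   # chain started from index N - 1
doeblin_ok, forgetting_ok = True, True

# Run both chains backward in time, from s = STEPS - 1 down to s = 0.
for n, s in enumerate(range(STEPS - 1, -1, -1), start=1):
    lam = backward_kernel(weights[s], particles[s], particles[s + 1])
    omega = sum(weights[s])
    doeblin_ok = doeblin_ok and all(
        lam[i][j] >= eps * weights[s][j] / omega - 1e-12
        for i in range(N) for j in range(N))
    mu, nu = push(mu, lam), push(nu, lam)
    forgetting_ok = forgetting_ok and total_variation(mu, nu) <= rho ** n + 1e-12

print(doeblin_ok, forgetting_ok)
```

The minorizing measure in the check is the normalized weight vector, so the contraction rate does not depend on the number of particles.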
To compute an upper bound for the $L_q$-mean error of the FFBSi algorithm,
we define the difference between the FFBSi and the FFBS estimators:

$$\delta^N_T[S_{T,r}] := \widetilde{\phi}^N_{0:T|T}[S_{T,r}] - \phi^N_{0:T|T}[S_{T,r}]. \qquad (25)$$



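Proof of Theorem 1 for the FFBSi estimator. The difference between the FFBS
and the FFBSi estimators, $\delta^N_T[S_{T,r}]$, defined in (25), can be written (with the convention $\mathcal{G}^N_{T+1,T} := \mathcal{F}^N_T$)

$$
\begin{aligned}
\delta^N_T[S_{T,r}] &= \frac{1}{N}\sum_{\ell=1}^{N}\sum_{t=r}^{T}\left\{h_t\big(\xi^{N,J^{\ell}_{t-r:t}}_{t-r:t}\big) - \mathbb{E}\left[h_t\big(\xi^{N,J^{\ell}_{t-r:t}}_{t-r:t}\big)\,\middle|\,\mathcal{F}^N_T\right]\right\}\\
&= \frac{1}{N}\sum_{\ell=1}^{N}\sum_{t=r}^{T}\sum_{u=t-r}^{T}\left\{\mathbb{E}\left[h_t\big(\xi^{N,J^{\ell}_{t-r:t}}_{t-r:t}\big)\,\middle|\,\mathcal{G}^N_{u,T}\right] - \mathbb{E}\left[h_t\big(\xi^{N,J^{\ell}_{t-r:t}}_{t-r:t}\big)\,\middle|\,\mathcal{G}^N_{u+1,T}\right]\right\}\\
&= \frac{1}{N}\sum_{\ell=1}^{N}\sum_{u=0}^{T}\zeta_u^{N,\ell},
\end{aligned}
$$

where

$$\zeta_u^{N,\ell} := \sum_{t=r}^{(u+r)\wedge T}\left\{\mathbb{E}\left[h_t\big(\xi^{N,J^{\ell}_{t-r:t}}_{t-r:t}\big)\,\middle|\,\mathcal{G}^N_{u,T}\right] - \mathbb{E}\left[h_t\big(\xi^{N,J^{\ell}_{t-r:t}}_{t-r:t}\big)\,\middle|\,\mathcal{G}^N_{u+1,T}\right]\right\}.$$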
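The last step of this decomposition regroups a double sum over $t \in \{r,\dots,T\}$ and $u \in \{t-r,\dots,T\}$ as a sum over $u \in \{0,\dots,T\}$ and $t \in \{r,\dots,(u+r)\wedge T\}$. As a sanity check, the short sketch below (the function names are illustrative only) enumerates both index sets and confirms that they coincide for small values of $T$ and $r$.

```python
def pairs_inner_u(T, r):
    """Index pairs (t, u) with r <= t <= T and t - r <= u <= T."""
    return {(t, u) for t in range(r, T + 1) for u in range(t - r, T + 1)}


def pairs_inner_t(T, r):
    """Index pairs (t, u) with 0 <= u <= T and r <= t <= min(u + r, T)."""
    return {(t, u) for u in range(T + 1) for t in range(r, min(u + r, T) + 1)}


# The two orderings of summation range over exactly the same pairs.
exchange_ok = all(pairs_inner_u(T, r) == pairs_inner_t(T, r)
                  for T in range(0, 8) for r in range(0, T + 1))
print(exchange_ok)
```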



For all $\ell \in \{1,\dots,N\}$ and all $u \in \{0,\dots,T\}$, the random variable $\zeta_u^{N,\ell}$ is $\mathcal{G}^N_{u,T}$-measurable
and $\mathbb{E}[\zeta_u^{N,\ell} \mid \mathcal{G}^N_{u+1,T}] = 0$, so that $\zeta_u^{N,\ell}$ can be seen as the increment
of a backward martingale. Hence, since $q \ge 2$, using the Burkholder inequality
(see [13, Theorem 2.10, page 23]), there exists a constant $C$ (depending only on
$q$, $\sigma_-$, $\sigma_+$, $c_-$, $\sup_{t\ge 1}|\vartheta_t|_\infty$ and $\sup_{t\ge 0}|\omega_t|_\infty$) such that:



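$$\left\|\delta^N_T[S_{T,r}]\right\|_q \le C\left\{\sum_{u=0}^{T}\mathbb{E}\left[\left|N^{-1}\sum_{\ell=1}^{N}\zeta_u^{N,\ell}\right|^{q}\right]^{2/q}\right\}^{1/2}. \qquad (26)$$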



Then, since the random variables $\{\zeta_u^{N,\ell}\}_{\ell=1}^{N}$ are conditionally independent and
centered conditionally to $\mathcal{G}^N_{u+1,T}$, using the Burkholder inequality once again
implies:



E 



N 



rN 

yu+i,T 



< 



N 



^=1 



-rN 

^u+l,T 



(27) 
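Furthermore, according to Lemma 2,

$$
\begin{aligned}
\left|\zeta_u^{N,\ell}\right| &\le \sum_{t=r}^{(u+r)\wedge T}\left|\mathbb{E}\left[h_t\big(\xi^{N,J^{\ell}_{t-r:t}}_{t-r:t}\big)\,\middle|\,\mathcal{G}^N_{u,T}\right] - \mathbb{E}\left[h_t\big(\xi^{N,J^{\ell}_{t-r:t}}_{t-r:t}\big)\,\middle|\,\mathcal{G}^N_{u+1,T}\right]\right|\\
&\le \sum_{t=r}^{u}\rho^{u-t}\,\mathrm{osc}(h_t) + \sum_{t=u+1}^{(u+r)\wedge T}\mathrm{osc}(h_t).
\end{aligned}
\qquad (28)
$$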



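Putting (26), (27) and (28) together leads to

$$\left\|\delta^N_T[S_{T,r}]\right\|_q \le \frac{C}{N^{1/2}}\left(\sum_{u=0}^{T}\left(\sum_{t=r}^{(u+r)\wedge T}\rho^{(u-t)\vee 0}\,\mathrm{osc}(h_t)\right)^{2}\right)^{1/2}.$$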



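Using the Hölder inequality as in the proof of Proposition 1 yields the corresponding bound on $\|\delta^N_T[S_{T,r}]\|_q$,
and the proof of Theorem 1 for the FFBSi estimator is derived from the triangle
inequality:

$$\left\|\widetilde{\phi}^N_{0:T|T}[S_{T,r}] - \phi_{0:T|T}[S_{T,r}]\right\|_q \le \left\|\Delta^N_T[S_{T,r}]\right\|_q + \left\|\delta^N_T[S_{T,r}]\right\|_q,$$

where $\Delta^N_T[S_{T,r}]$ is defined by (9) and $\delta^N_T[S_{T,r}]$ is defined by (25). □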

6 Proof of Theorem 2

We preface the proof of the Theorem by showing that the martingale term of
the error $\Delta^N_T[S_{T,r}]$ (which is defined by (9)) satisfies an exponential deviation
inequality in the following Proposition.



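Proposition 3. Assume A1-3. There exists a constant $C$ (depending only on
$\sigma_-$, $c_-$, $\sup_{t\ge 1}|\vartheta_t|_\infty$ and $\sup_{t\ge 0}|\omega_t|_\infty$) such that for any $T < \infty$, any $N \ge 1$,
any $\varepsilon > 0$, any integer $r$ and any bounded and measurable functions $\{h_s\}_{s=r}^{T}$ on
$\mathbb{X}^{r+1}$,

$$\mathbb{P}\left\{\left|\sum_{t=0}^{T}D^N_{t,T}(S_{T,r})\right| > \varepsilon\right\} \le 2\exp\left(-\frac{CN\varepsilon^{2}}{\Theta_{r,T}}\right), \qquad (29)$$

where $D^N_{t,T}$ is defined in (14) and $\Theta_{r,T}$ is defined by (17).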
Proof. According to the definition of $D^N_{t,T}(S_{T,r})$ given in (14), we can write

$$\sum_{t=0}^{T}D^N_{t,T}(S_{T,r}) = \sum_{k=1}^{N(T+1)}\upsilon^N_k,$$

where for all $t \in \{0,\dots,T\}$ and $\ell \in \{1,\dots,N\}$, $\upsilon^N_{Nt+\ell}$ is defined by



-1 N,i'^t',T^T,r(it 



t,T^\oa 



and is bounded by (see (18))



Furthermore, we define the filtration $\{\mathcal{H}^N_k\}_{k=1}^{N(T+1)}$, for all $t \in \{0,\dots,T\}$ and
$\ell \in \{1,\dots,N\}$, by:

with the convention $\mathcal{F}^N_{-1} = \sigma(Y_{0:T})$. Then, according to Lemma 1, $\{\upsilon^N_k\}_{k=1}^{N(T+1)}$
is a martingale increment sequence for the filtration $\{\mathcal{H}^N_k\}_{k=1}^{N(T+1)}$ and the Azuma-Hoeffding
inequality completes the proof. □



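Proposition 4. Assume A1-3. There exists a constant $C$ (depending only on
$\sigma_-$, $\sigma_+$, $c_-$, $\sup_{t\ge 1}|\vartheta_t|_\infty$ and $\sup_{t\ge 0}|\omega_t|_\infty$) such that for any $T < \infty$, any $N \ge 1$,
any $\varepsilon > 0$, any integer $r$ and any bounded and measurable functions $\{h_s\}_{s=r}^{T}$ on
$\mathbb{X}^{r+1}$,

$$\mathbb{P}\left\{\left|\sum_{t=0}^{T}C^N_{t,T}(S_{T,r})\right| > \varepsilon\right\} \le 8\exp\left(-\frac{CN\varepsilon}{(1+r)\sum_{s=r}^{T}\mathrm{osc}(h_s)}\right), \qquad (30)$$

where $C^N_{t,T}(F)$ is defined in (15).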



Proof. In order to apply Lemma 4 in the appendix, we first need to find an
exponential deviation inequality for $C^N_{t,T}(S_{T,r})$, which is done by using the
decomposition $C^N_{t,T}(S_{T,r}) = U^N_{t,T}V^N_{t,T}W^N_{t,T}$ given in (23). First, the ratio $U^N_{t,T}$ is
dealt with through Lemma 3 in the appendix by defining

$$a_N := N^{-1}\sum_{\ell=1}^{N}\omega_t^{N,\ell}\,G^N_{t,T}S_{T,r}(\xi_t^{N,\ell})/|L_{t,T}\mathbf{1}|_\infty, \qquad b_N := N^{-1}\Omega_t^N.$$
The assumptions show that $b \ge \beta$ and (18) shows that $|a_N/b_N| \le C(1+r)\max_{r\le t\le T}\mathrm{osc}(h_t)$. Therefore, Condition (i) of Lemma 3 is satisfied. The
bounds $0 < \omega_t^{N,\ell} \le |\omega_t|_\infty$ and the Hoeffding inequality lead to



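$$\mathbb{P}\left[|b_N - b| > \varepsilon\right] = \mathbb{E}\left[\mathbb{P}\left[\left|N^{-1}\sum_{\ell=1}^{N}\left(\omega_t^{N,\ell} - \mathbb{E}\left[\omega_t^{N,1}\,\middle|\,\mathcal{F}^N_{t-1}\right]\right)\right| > \varepsilon\,\middle|\,\mathcal{F}^N_{t-1}\right]\right] \le 2\exp\left(-\frac{2N\varepsilon^{2}}{|\omega_t|_\infty^{2}}\right),$$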



establishing Condition (ii) in Lemma 3. Finally, Lemma 1(i) and the Hoeffding
inequality imply that



I aw I > e] = E 



N 



> e 



t-i 



< 2exp - 



Lemma 3 therefore yields

P{|?/t^T| > e} < 2 exp 



CNe 



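Then $V^N_{t,T}$ is dealt with by using again the Hoeffding inequality and the bounds
$0 < b_t^{N,\ell} \le |\omega_t|_\infty$, where

$$b_t^{N,\ell} := \omega_t^{N,\ell}\,\frac{L_{t,T}\mathbf{1}(\xi_t^{N,\ell})}{|L_{t,T}\mathbf{1}|_\infty}:$$

$$\mathbb{P}\left\{|V^N_{t,T}| > \varepsilon\right\} = \mathbb{E}\left[\mathbb{P}\left\{|V^N_{t,T}| > \varepsilon\,\middle|\,\mathcal{F}^N_{t-1}\right\}\right] \le 2\exp\left(-CN\varepsilon^{2}\right).$$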
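The Hoeffding inequality used repeatedly in this proof states that the mean of $N$ independent variables taking values in an interval of width $c$ deviates from its expectation by more than $\varepsilon$ with probability at most $2\exp(-2N\varepsilon^2/c^2)$. The sketch below is a purely illustrative Monte Carlo check of that bound: the uniform distribution, the sample size and the function names are assumptions made for the example and are not taken from the paper.

```python
import math
import random


def hoeffding_bound(n, eps, width):
    """Two-sided Hoeffding bound for the mean of n variables in [0, width]."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2 / width ** 2)


def empirical_tail(n, eps, width, trials, rng):
    """Monte Carlo estimate of P(|mean - E[mean]| > eps) for uniform draws."""
    exceed = 0
    for _ in range(trials):
        mean = sum(rng.uniform(0.0, width) for _ in range(n)) / n
        if abs(mean - width / 2.0) > eps:
            exceed += 1
    return exceed / trials


rng = random.Random(1)
N, WIDTH, TRIALS = 50, 1.0, 2000
results = [(eps, empirical_tail(N, eps, WIDTH, TRIALS, rng),
            hoeffding_bound(N, eps, WIDTH))
           for eps in (0.05, 0.1, 0.2)]
hoeffding_ok = all(emp <= bound for _, emp, bound in results)
for eps, emp, bound in results:
    print(f"eps={eps}: empirical={emp:.4f}, bound={bound:.4f}")
```

In the proof the inequality is applied conditionally on $\mathcal{F}^N_{t-1}$, under which the weighted particles are independent; the simulation only illustrates the unconditional statement.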



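Finally, $W^N_{t,T}$ has been shown in (24) to be bounded by a constant depending
only on $\sigma_-$, $\sigma_+$, $c_-$, $\sup_{t\ge 1}|\vartheta_t|_\infty$ and $\sup_{t\ge 0}|\omega_t|_\infty$: $|W^N_{t,T}| \le C$, so that

$$\mathbb{P}\left\{|C^N_{t,T}(S_{T,r})| > \varepsilon\right\} \le \mathbb{P}\left\{|U^N_{t,T}V^N_{t,T}| > \varepsilon/C\right\} \le \mathbb{P}\left\{|U^N_{t,T}| > \varepsilon_u\right\} + \mathbb{P}\left\{|V^N_{t,T}| > \varepsilon_v\right\},$$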



where 



dot 



. e^pni'^'^(*-'*'^-'-*'0)osc(/is)/C and e, 

\ s— r 



dof 
Therefore,

V {\C^T{ST.r)\ >£} < 4exp 



CNe 



The proof of (30) is finally completed by applying Lemma 4 with

CN 



Xt = CMST,r) , A = 4 , Bt 



7 = 1/2. 



□ 



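Proof of Theorem 2 for the FFBS estimator. The result is obtained by writing

$$\mathbb{P}\left\{\left|\Delta^N_T[S_{T,r}]\right| > \varepsilon\right\} \le \mathbb{P}\left\{\left|\sum_{t=0}^{T}D^N_{t,T}(S_{T,r})\right| > \varepsilon/2\right\} + \mathbb{P}\left\{\left|\sum_{t=0}^{T}C^N_{t,T}(S_{T,r})\right| > \varepsilon/2\right\}$$

and using (29) and (30).

□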



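Proof of Theorem 2 for the FFBSi estimator. We recall the decomposition used
in the proof of Theorem 1 for the FFBSi estimator:

$$\delta^N_T[S_{T,r}] = \frac{1}{N}\sum_{\ell=1}^{N}\sum_{u=0}^{T}\zeta_u^{N,\ell},$$

where $\delta^N_T[S_{T,r}]$ is defined by (25). Since $\{\zeta_u^{N,\ell}\}_{\ell=1}^{N}$ are $\mathcal{G}^N_{u,T}$-measurable and
centered conditionally to $\mathcal{G}^N_{u+1,T}$, using the same steps as in the proof of Proposition 3
we get

$$\mathbb{P}\left\{\left|\delta^N_T[S_{T,r}]\right| > \varepsilon\right\} \le 2\exp\left(-\frac{CN\varepsilon^{2}}{\Theta_{r,T}}\right),$$

where $\Theta_{r,T}$ is defined by (17). The proof is finally completed by writing

$$\widetilde{\phi}^N_{0:T|T}[S_{T,r}] - \phi_{0:T|T}[S_{T,r}] = \Delta^N_T[S_{T,r}] + \delta^N_T[S_{T,r}],$$

and by using Theorem 2 for the FFBS estimator.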



□ 



A Technical results 

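Lemma 3. Assume that $a_N$, $b_N$, and $b$ are random variables defined on the
same probability space such that there exist positive constants $\beta$, $B$, $C$, and $M$
satisfying

(i) $|a_N/b_N| \le M$, $\mathbb{P}$-a.s. and $b \ge \beta$, $\mathbb{P}$-a.s.,
(ii) For all $\varepsilon > 0$ and all $N \ge 1$, $\mathbb{P}\left[|b_N - b| > \varepsilon\right] \le B e^{-CN\varepsilon^{2}}$,
(iii) For all $\varepsilon > 0$ and all $N \ge 1$, $\mathbb{P}\left[|a_N| > \varepsilon\right] \le B e^{-CN(\varepsilon/M)^{2}}$.

Then,

$$\mathbb{P}\left\{\left|\frac{a_N}{b_N}\right| > \varepsilon\right\} \le B\exp\left(-CN\left(\frac{\varepsilon\beta}{2M}\right)^{2}\right).$$

Proof. See [8, Lemma 4].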



□ 



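Lemma 4. For $T \ge 0$, let $\{X_t\}_{t=0}^{T}$ be $(T+1)$ random variables. Assume that
there exists a constant $A \ge 1$ and, for all $0 \le t \le T$, a constant
$B_t > 0$ such that for all $\varepsilon > 0$

$$\mathbb{P}\left\{|X_t| > \varepsilon\right\} \le A e^{-B_t\varepsilon}.$$

Then, for all $0 < \gamma < 1$ and all $\varepsilon > 0$, we have

$$\mathbb{P}\left\{\left|\sum_{t=0}^{T}X_t\right| > \varepsilon\right\} \le \frac{A}{1-\gamma}\,e^{-\gamma B\varepsilon/(T+1)}, \quad\text{where}\quad B := \left(\frac{1}{T+1}\sum_{t=0}^{T}B_t^{-1}\right)^{-1}.$$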



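Proof. By the Bienaymé-Tchebychev inequality, we have

$$\mathbb{P}\left\{\left|\sum_{t=0}^{T}X_t\right| > \varepsilon\right\} = \mathbb{P}\left\{\exp\left(\frac{\gamma B}{T+1}\left|\sum_{t=0}^{T}X_t\right|\right) > e^{\gamma B\varepsilon/(T+1)}\right\} \le e^{-\gamma B\varepsilon/(T+1)}\,\mathbb{E}\left[\exp\left(\frac{\gamma B}{T+1}\left|\sum_{t=0}^{T}X_t\right|\right)\right]. \qquad (31)$$

It remains to bound the expectation in the RHS of (31) by $A(1-\gamma)^{-1}$. First,
by the Minkowski inequality,

$$\mathbb{E}\left[\exp\left(\frac{\gamma B}{T+1}\left|\sum_{t=0}^{T}X_t\right|\right)\right] = \sum_{q\ge 0}\frac{\gamma^{q}B^{q}}{q!\,(T+1)^{q}}\,\mathbb{E}\left[\left|\sum_{t=0}^{T}X_t\right|^{q}\right] \le 1 + \sum_{q\ge 1}\frac{\gamma^{q}B^{q}}{q!\,(T+1)^{q}}\left(\sum_{t=0}^{T}\|X_t\|_q\right)^{q}.$$

Moreover, for $q \ge 1$, $\mathbb{E}\left[|X_t|^{q}\right]$ can be bounded by

$$\mathbb{E}\left[|X_t|^{q}\right] = \int_0^{\infty}\mathbb{P}\left\{|X_t| > \varepsilon^{1/q}\right\}\mathrm{d}\varepsilon \le A\int_0^{\infty}e^{-B_t\varepsilon^{1/q}}\,\mathrm{d}\varepsilon = \frac{A\,q!}{B_t^{q}}.$$

Finally,

$$\mathbb{E}\left[\exp\left(\frac{\gamma B}{T+1}\left|\sum_{t=0}^{T}X_t\right|\right)\right] \le 1 + A\sum_{q\ge 1}\gamma^{q}\left(\frac{B}{T+1}\sum_{t=0}^{T}B_t^{-1}\right)^{q} \le A\sum_{q\ge 0}\gamma^{q} = \frac{A}{1-\gamma}.$$

□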
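As an illustration of Lemma 4, the following sketch compares the bound with a Monte Carlo estimate for a simple hypothetical case: independent exponential variables with rates $B_t$, for which the tail condition holds with $A = 1$. Independence is not required by the lemma and is only used here to make simulation easy; the rates, the value of $\gamma$ and the function names are assumptions of the example.

```python
import math
import random


def lemma4_bound(eps, rates, gamma, A=1.0):
    """Bound A / (1 - gamma) * exp(-gamma * B * eps / (T + 1)) from Lemma 4."""
    t_plus_1 = len(rates)
    B = t_plus_1 / sum(1.0 / b for b in rates)  # harmonic-type mean of the B_t
    return A / (1.0 - gamma) * math.exp(-gamma * B * eps / t_plus_1)


def empirical_tail(eps, rates, trials, rng):
    """Monte Carlo estimate of P(sum_t X_t > eps) with X_t ~ Exponential(B_t)."""
    hits = sum(1 for _ in range(trials)
               if sum(rng.expovariate(b) for b in rates) > eps)
    return hits / trials


rng = random.Random(2)
RATES, GAMMA, TRIALS = [1.0, 2.0, 4.0], 0.5, 20000
checks = [(eps, empirical_tail(eps, RATES, TRIALS, rng),
           lemma4_bound(eps, RATES, GAMMA))
          for eps in (2.0, 5.0, 10.0)]
lemma4_ok = all(emp <= bound for _, emp, bound in checks)
for eps, emp, bound in checks:
    print(f"eps={eps}: empirical={emp:.4f}, bound={bound:.4f}")
```

With $A = 4$ and $\gamma = 1/2$, as in the proof of Proposition 4, the prefactor $A/(1-\gamma)$ equals 8, which is the constant appearing in (30).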